[1] ".GlobalEnv" "package:readxl" "package:gt" "package:plotly"
[5] "package:pls" "package:glmnet" "package:Matrix" "package:class"
[9] "package:nnet" "package:pROC" "package:boot" "package:leaps"
[13] "package:car" "package:carData" "package:faraway" "package:ISLR"
[17] "package:FNN" "package:vcd" "package:grid" "package:fitdistrplus"
[21] "package:survival" "package:MASS" "package:stats4" "package:haven"
[25] "package:pracma" "package:matlib" "package:forcats" "package:stringr"
[29] "package:dplyr" "package:purrr" "package:readr" "package:tidyr"
[33] "package:tibble" "package:ggplot2" "package:tidyverse" "package:knitr"
[37] "package:stats" "package:graphics" "package:grDevices" "package:utils"
[41] "package:datasets" "package:methods" "Autoloads" "package:base"
ALL NBA TEAM of the Year is an annual NBA award given to the best players of the season. Voting is by a group of sports journalists and announcers from the United States and Canada. The team has been chosen in each NBA season, since its inauguration in 1946. The award consists of three quintets consisting of a total of 15 players, five on each team. It originally consisted of two teams, but in 1988 it was increased to three.
Players receive five points for each vote on the first team, three points for each vote on the second team, and one point for each vote on the third team. The five players with the highest total number of points enter the first team, with the next five players integrating the second team and the same with the third. There is a position restriction. In each voting of 5 players (of each quintet), 2 players are voted that are guard, in our data.frame “PG” and “SG”; the other 2 being forward, “SF” and “PF” and the last player being center “C”. They are basically the top 15 players of the season. We will look at the statistics made by these 15 players (in all seasons), and with this information, having the statistics of all the players of the last 35 seasons, we will try to know which will be in the quintet and which not in recent years.
Our goal is to make a model that is able to predict these 15 players. To do this we need to make a multiple logistic regression model where, according to each player’s statistics, the model gives you a chance to belong to the best quintets. We need to keep in mind the restriction that only certain players in each position can have. We have a database, explained more specifically in point 2. INTRODUCTION TO THE DATABASE where we will have all the information to create this model. We will check the results obtained by the model in some of the last seasons where we already know the results, then we will test it with the statistics we have so far to predict this year’s quintets.
Our information for making this model is divided into two databases: - A database where we have all the information from 1980 to 2017 with all the player statistics for each season. We also introduce the quintetO variable that will provide us with information on whether or not the player is in the quintet of the season. - And another where we have the information of the data of the current season. In the first database we have the statistics of each player in a given season. The data have the following variables:
|Variables explained in the document|
We will divide the process into two parts:
On the one hand we have the creation of the model with a database that contains all the statistics from 1980 to 2017. In this, we will predict the quintets of the years 2015, 2016 and 2017. In the study we will be able to check which players have predicted correctly, and which ones don’t.
In the second part we will take the data available this year from the competition to the stop due to the COVID-19. In this one we will not be able to check if the model hits the players or not as this prize has not been awarded yet. We will also work on a data.frame that does not contain the same variables (although it is similar) so we will have to create another model.
The steps we have followed are as follows:
#Read csv file
library(readr)
Seasons_Stats <- read_csv("Seasons_Stats.csv",
col_types = cols(`2P%` = col_number(),
`3P` = col_double(), `3PAr` = col_number(),
`AST%` = col_number(), BLK = col_double(),
`BLK%` = col_number(), BPM = col_double(),
DBPM = col_double(), DRB = col_double(),
`DRB%` = col_number(), `FT%` = col_number(),
FTr = col_number(), GS = col_double(),
MP = col_double(), OBPM = col_double(),
ORB = col_double(), `ORB%` = col_number(),
PER = col_double(), STL = col_double(),
`STL%` = col_number(), `TOV%` = col_number(),`3P%` = col_double() ,
`3PA` = col_double(), `TOV` = col_double(),
TRB = col_double(), `TRB%` = col_number(),
`TS%` = col_number(), `USG%` = col_number(),
VORP = col_double(), `WS/48` = col_number(),
X1 = col_skip(), Year = col_integer(),
blank2 = col_skip(), blanl = col_skip(),
`eFG%` = col_number()))
#We select the data from 80s, when three-point shot is introduced in the NBA.
library(tidyverse)
Seasons_Stats2 <- Seasons_Stats %>% subset(Year > 1979)
#What we want to do is find which players are most likely to be in the best quintet. We will do this based on a training set and a test set.To do this, we will then nail the all_nba team (quinteto) variable, followd by the year (quinteto_year), which will have 1 (if the player is in a better all nba team) and 0 if it is not. In this way, we enter the data, and then we enter the column.
#Those players who have a * in the database are those who have been elected to the Basketball Hall of Fame. We also have it in the database.
quinteto80 <- c("Julius Erving*","Dan Roundfield","Larry Bird*","Marques Johnson","Kareem Abdul-Jabbar*","Moses Malone*","George Gervin*","Dennis Johnson*","Paul Westphal","Gus Williams")
quinteto81 <- c("Julius Erving*","Marques Johnson", "Larry Bird*","Adrian Dantley*","Kareem Abdul-Jabbar*","Moses Malone*","George Gervin*","Otis Birdsong","Dennis Johnson*","Tiny Archibald*")
quinteto82 <- c("Larry Bird*","Alex English*","Julius Erving*","Bernard King*","Moses Malone*","Robert Parish*", "George Gervin*","Magic Johnson*", "Gus Williams","Sidney Moncrief")
quinteto83 <- c("Larry Bird*","Alex English*" ,"Julius Erving*","Buck Williams","Moses Malone*","Kareem Abdul-Jabbar*", "Magic Johnson*","George Gervin*","Sidney Moncrief","Isiah Thomas*")
quinteto84 <- c("Larry Bird*","Adrian Dantley*","Bernard King*" ,"Julius Erving*","Kareem Abdul-Jabbar*","Moses Malone*" ,"Magic Johnson*","Sidney Moncrief" ,"Isiah Thomas*","Jim Paxson")
quinteto85 <- c("Larry Bird*","Terry Cummings","Bernard King*","Ralph Sampson*","Moses Malone*", "Kareem Abdul-Jabbar*","Magic Johnson*","Michael Jordan*","Isiah Thomas*","Sidney Moncrief")
quinteto86 <- c("Larry Bird*","Charles Barkley*","Dominique Wilkins*","Alex English*","Kareem Abdul-Jabbar*","Hakeem Olajuwon*","Magic Johnson*","Sidney Moncrief","Isiah Thomas*","Alvin Robertson")
quinteto87 <- c("Larry Bird*" ,"Dominique Wilkins*","Kevin McHale*","Charles Barkley*","Hakeem Olajuwon*", "Moses Malone*","Magic Johnson*","Isiah Thomas*","Michael Jordan*","Fat Lever")
quinteto88 <- c("Larry Bird*","Karl Malone*","Charles Barkley*","Dominique Wilkins*","Hakeem Olajuwon*", "Patrick Ewing*","Michael Jordan*","Clyde Drexler*","Magic Johnson*","John Stockton*")
quinteto89 <- c("Karl Malone*","Tom Chambers","Dominique Wilkins*", "Charles Barkley*","Chris Mullin*", "Terry Cummings","Hakeem Olajuwon*","Patrick Ewing*","Robert Parish*","Michael Jordan*" ,"John Stockton*","Dale Ellis","Magic Johnson*","Kevin Johnson","Mark Price")
quinteto90 <- c("Karl Malone*","Larry Bird*","James Worthy*","Charles Barkley*","Tom Chambers","Chris Mullin*","Patrick Ewing*","Hakeem Olajuwon*","David Robinson*","Magic Johnson*","John Stockton*","Clyde Drexler*","Michael Jordan*","Kevin Johnson","Joe Dumars*")
quinteto91 <- c("Karl Malone*","Dominique Wilkins*","James Worthy*","Charles Barkley*","Chris Mullin*","Bernard King*","David Robinson*","Patrick Ewing*","Hakeem Olajuwon*","Michael Jordan*" ,"Kevin Johnson","John Stockton*","Magic Johnson*","Clyde Drexler*","Joe Dumars*")
quinteto92 <- c("Karl Malone*","Scottie Pippen*","Dennis Rodman*","Chris Mullin*","Charles Barkley*","Kevin Willis","David Robinson*","Patrick Ewing*","Brad Daugherty","Michael Jordan*","Tim Hardaway","Mark Price", "Clyde Drexler*","John Stockton*","Kevin Johnson")
quinteto93 <- c("Charles Barkley*","Dominique Wilkins*","Scottie Pippen*","Karl Malone*","Larry Johnson", "Derrick Coleman","Hakeem Olajuwon*","Patrick Ewing*","David Robinson*","Michael Jordan*","John Stockton*","Tim Hardaway","Mark Price","Joe Dumars*","Drazen Petrovic*")
quinteto94 <- c("Scottie Pippen*","Shawn Kemp","Derrick Coleman","Karl Malone*","Charles Barkley*","Dominique Wilkins*","Hakeem Olajuwon*","David Robinson*","Shaquille O'Neal*","John Stockton*","Mitch Richmond*","Mark Price","Latrell Sprewell","Kevin Johnson","Gary Payton*")
quinteto95 <- c("Karl Malone*","Charles Barkley*","Detlef Schrempf","Scottie Pippen*","Shawn Kemp","Dennis Rodman*","David Robinson*","Shaquille O'Neal*","Hakeem Olajuwon*","John Stockton*","Gary Payton*","Reggie Miller*","Anfernee Hardaway","Mitch Richmond*","Clyde Drexler*")
quinteto96 <- c("Scottie Pippen*","Shawn Kemp","Charles Barkley*","Karl Malone*","Grant Hill","Juwan Howard","David Robinson*","Hakeem Olajuwon*","Shaquille O'Neal*","Michael Jordan*","Gary Payton*","Mitch Richmond*","Anfernee Hardaway","John Stockton*" ,"Reggie Miller*")
quinteto97 <- c("Karl Malone*" ,"Scottie Pippen*" ,"Anthony Mason","Grant Hill","Glen Rice","Vin Baker", "Hakeem Olajuwon*","Patrick Ewing*","Shaquille O'Neal*" ,"Michael Jordan*" ,"Gary Payton*","John Stockton*","Tim Hardaway","Mitch Richmond*" ,"Anfernee Hardaway")
quinteto98 <- c("Karl Malone*","Grant Hill","Scottie Pippen*","Tim Duncan","Vin Baker","Glen Rice","Shaquille O'Neal*","David Robinson*","Dikembe Mutombo*","Michael Jordan*","Tim Hardaway","Mitch Richmond*" ,"Gary Payton*","Rod Strickland","Reggie Miller*")
quinteto99 <- c("Karl Malone*","Chris Webber" ,"Kevin Garnett","Tim Duncan", "Grant Hill" ,"Antonio McDyess" ,"Alonzo Mourning*","Shaquille O'Neal*","Hakeem Olajuwon*","Allen Iverson*","Gary Payton*","Kobe Bryant","Jason Kidd","Tim Hardaway","John Stockton*")
quinteto00 <- c("Tim Duncan","Karl Malone*","Chris Webber","Kevin Garnett","Grant Hill","Vince Carter","Shaquille O'Neal*","Alonzo Mourning*" ,"David Robinson*","Jason Kidd","Allen Iverson*","Eddie Jones","Gary Payton*","Kobe Bryant","Stephon Marbury")
quinteto01 <- c("Tim Duncan","Kevin Garnett","Karl Malone*","Chris Webber","Vince Carter","Dirk Nowitzki","Shaquille O'Neal*","Dikembe Mutombo*","David Robinson*","Allen Iverson*","Kobe Bryant","Gary Payton*" ,"Jason Kidd","Tracy McGrady","Ray Allen")
quinteto02 <- c("Tim Duncan","Kevin Garnett","Ben Wallace","Tracy McGrady","Chris Webber","Jermaine O'Neal","Shaquille O'Neal*","Dirk Nowitzki","Dikembe Mutombo*","Jason Kidd","Gary Payton*","Paul Pierce", "Kobe Bryant","Allen Iverson*","Steve Nash")
quinteto03 <- c("Tim Duncan","Dirk Nowitzki","Paul Pierce","Kevin Garnett","Chris Webber" ,"Jamal Mashburn","Shaquille O'Neal*","Ben Wallace","Jermaine O'Neal","Kobe Bryant","Jason Kidd","Stephon Marbury","Tracy McGrady","Allen Iverson*","Steve Nash")
quinteto04 <- c("Kevin Garnett","Jermaine O'Neal","Dirk Nowitzki","Tim Duncan","Peja Stojakovicn","Ron Artest","Shaquille O'Neal*","Ben Wallace","Yao Ming*","Kobe Bryant","Sam Cassell","Michael Redd","Jason Kidd","Tracy McGrady","Baron Davis")
quinteto05 <- c("Tim Duncan","LeBron James","Tracy McGrady","Dirk Nowitzki","Kevin Garnett","Shawn Marion", "Shaquille O'Neal*","Amar'e Stoudemire","Ben Wallace","Allen Iverson*","Dwyane Wade","Kobe Bryant","Steve Nash","Ray Allen","Gilbert Arenas")
quinteto06 <- c("LeBron James","Elton Brand","Shawn Marion","Dirk Nowitzki","Tim Duncan","Carmelo Anthony", "Shaquille O'Neal*","Ben Wallace","Yao Ming*","Kobe Bryant","Chauncey Billups","Allen Iverson*","Steve Nash","Dwyane Wade","Gilbert Arenas")
quinteto07 <- c("Dirk Nowitzki","LeBron James","Kevin Garnett","Tim Duncan","Chris Bosh","Carmelo Anthony","Amar'e Stoudemire","Yao Ming*","Dwight Howard","Steve Nash","Gilbert Arenas","Dwyane Wade","Kobe Bryant","Tracy McGrady","Chauncey Billups")
quinteto08 <- c("Kevin Garnett","Dirk Nowitzki","Carlos Boozer","LeBron James","Tim Duncan" ,"Paul Pierce", "Dwight Howard","Amar'e Stoudemire","Yao Ming*","Kobe Bryant","Steve Nash","Tracy McGrady","Chris Paul","Deron Williams","Manu Ginobili")
quinteto09 <- c("LeBron James","Tim Duncan","Carmelo Anthony","Dirk Nowitzki","Paul Pierce","Pau Gasol","Dwight Howard","Yao Ming*","Shaquille O'Neal*","Kobe Bryant","Brandon Roy","Chauncey Billups", "Dwyane Wade","Chris Paul","Tony Parker")
quinteto10 <- c("LeBron James","Dirk Nowitzki","Brandon Roy","Kevin Durant","Steve Nash","Pau Gasol","Dwight Howard","Amar'e Stoudemire","Andrew Bogut","Kobe Bryant","Carmelo Anthony","Tim Duncan", "Dwyane Wade","Deron Williams","Joe Johnson")
quinteto11 <- c("LeBron James","Dirk Nowitzki","LaMarcus Aldridge","Kevin Durant","Amar'e Stoudemire","Zach Randolph","Dwight Howard","Pau Gasol","Al Horford", "Kobe Bryant","Dwyane Wade","Manu Ginobili","Derrick Rose","Russell Westbrook","Chris Paul")
quinteto12 <- c("LeBron James","Kevin Love","Carmelo Anthony","Kevin Durant","Blake Griffin","Dirk Nowitzki","Dwight Howard","Andrew Bynum","Tyson Chandler","Kobe Bryant","Tony Parker","Dwyane Wade","Chris Paul","Russell Westbrook","Rajon Rondo")
quinteto13 <- c("LeBron James","Carmelo Anthony","Paul George","Kevin Durant","Blake Griffin","David Lee", "Tim Duncan","Marc Gasol","Dwight Howard","Kobe Bryant","Tony Parker","Dwyane Wade","Chris Paul","Russell Westbrook","James Harden")
quinteto14 <- c("Kevin Durant","Blake Griffin","Paul George","LeBron James","Kevin Love","LaMarcus Aldridge","Joakim Noah","Dwight Howard","Al Jefferson","James Harden","Stephen Curry","Goran Dragic","Chris Paul","Tony Parker","Damian Lillard")
quinteto15 <- c("LeBron James","LaMarcus Aldridge","Blake Griffin","Anthony Davis","DeMarcus Cousins","Tim Duncan","Marc Gasol","Pau Gasol","DeAndre Jordan","James Harden","Russell Westbrook","Klay Thompson","Stephen Curry","Chris Paul","Kyrie Irving")
quinteto16 <- c("Kawhi Leonard","Kevin Durant","Paul George","LeBron James","Draymond Green","LaMarcus Aldridge","DeAndre Jordan","DeMarcus Cousins","Andre Drummond","Stephen Curry","Damian Lillard","Klay Thompson","Russell Westbrook","Chris Paul","Kyle Lowry")
quinteto17 <- c("Kawhi Leonard","Kevin Durant","Jimmy Butler","LeBron James","Giannis Antetokounmpo","Draymond Green","Anthony Davis","Rudy Gobert","DeAndre Jordan","James Harden","Stephen Curry","John Wall","Russell Westbrook","Isaiah Thomas","DeMar DeRozan")
quinteto20 <-c('Guillem Miralles','Miguel Payá')
#With this function what we do is introduce the all nba team of each year and the year to which it corresponds. Our function what it does is return a data.frame with the new column (quinteto) indicating 1 if the player is in the quintet, and 0 if it is not. Then we will join all the dat.frame in order to get the data for all the seasons.
funcion_quinteto <- function(quinteto_df,any){
df <- Seasons_Stats2 %>% subset(Year == any)
quinteto <- 0
for (i in (1:length(df$Player))) {
if (df$Player[i] %in% quinteto_df){
quinteto[i] = 1
}else{
quinteto[i] = 0
}
}
df_any<- data.frame(quinteto,df)
return(df_any)
}
#We apply the function for each year:
df_any80 <- funcion_quinteto(quinteto80,1980)
df_any81 <- funcion_quinteto(quinteto81,1981)
df_any82 <- funcion_quinteto(quinteto82,1982)
df_any83 <- funcion_quinteto(quinteto83,1983)
df_any84 <- funcion_quinteto(quinteto84,1984)
df_any85 <- funcion_quinteto(quinteto85,1985)
df_any86 <- funcion_quinteto(quinteto86,1986)
df_any87 <- funcion_quinteto(quinteto87,1987)
df_any88 <- funcion_quinteto(quinteto88,1988)
df_any89 <- funcion_quinteto(quinteto89,1989)
df_any90 <- funcion_quinteto(quinteto90,1990)
df_any91 <- funcion_quinteto(quinteto91,1991)
df_any92 <- funcion_quinteto(quinteto92,1992)
df_any93 <- funcion_quinteto(quinteto93,1993)
df_any94 <- funcion_quinteto(quinteto94,1994)
df_any95 <- funcion_quinteto(quinteto95,1995)
df_any96 <- funcion_quinteto(quinteto96,1996)
df_any97 <- funcion_quinteto(quinteto97,1997)
df_any98 <- funcion_quinteto(quinteto98,1998)
df_any99 <- funcion_quinteto(quinteto99,1999)
df_any00 <- funcion_quinteto(quinteto00,2000)
df_any01 <- funcion_quinteto(quinteto01,2001)
df_any02 <- funcion_quinteto(quinteto02,2002)
df_any03 <- funcion_quinteto(quinteto03,2003)
df_any04 <- funcion_quinteto(quinteto04,2004)
df_any05 <- funcion_quinteto(quinteto05,2005)
df_any06 <- funcion_quinteto(quinteto06,2006)
df_any07 <- funcion_quinteto(quinteto07,2007)
df_any08 <- funcion_quinteto(quinteto08,2008)
df_any09 <- funcion_quinteto(quinteto09,2009)
df_any10 <- funcion_quinteto(quinteto10,2010)
df_any11 <- funcion_quinteto(quinteto11,2011)
df_any12 <- funcion_quinteto(quinteto12,2012)
df_any13 <- funcion_quinteto(quinteto13,2013)
df_any14 <- funcion_quinteto(quinteto14,2014)
df_any15 <- funcion_quinteto(quinteto15,2015)
df_any16 <- funcion_quinteto(quinteto16,2016)
df_any17 <- funcion_quinteto(quinteto17,2017)
#We unite in order to have all the seasons:
#We set up a training set and a test set. The training set will be stored until 2011, while the test set will be used from 2012 to 2017.
bd <- rbind(df_any80,df_any81,df_any82,df_any83,df_any84,df_any85,df_any86,df_any87,df_any88,df_any89,
df_any90,df_any91,df_any92,df_any93,df_any94,df_any95,df_any96,df_any97,df_any98,df_any99,
df_any00,df_any01,df_any02,df_any03,df_any04,df_any05,df_any06,df_any07,df_any08,df_any09,
df_any10,df_any11)
bdpredict<-rbind(df_any12,df_any13,df_any14,df_any15,df_any16,df_any17)
#We remove the NULL values and na's
bd[is.na(bd)] = 0
bdpredict[is.na(bdpredict)]= 0
#In the data we have a problem. There are players who are in the all nba team, who in the middle of the season have been transferred to another team. So they have stats from 2 different teams. The database already incorporates the sum of these two statistics, so we have 3 rows with the player's statistics. We are only interested in the total data for the season. The players in this situation are: Dominique Wiklins (1994), Clyde Draxler (1995), Dikembe Mutombo (2001) and Chauncey Billups (2009). We eliminate the two rows that do not interest us in each player (those of the teams).
bd <- bd[-c(6069,6070,6203,6204,9568,9569,13639,13640),]
df_any17 <-df_any17[-c(112,113),]
#We begin the visualization of the variables. Already at first glance if we look at the variables, there are many that can be correlated as they explain the same thing. To see it more clearly, let's look at some of them:
attach(bd)
pairs( X3P ~ X3PA + X3P.)
pairs(X2P ~ X2PA + X2P.)
pairs(WS ~ OWS + DWS + WS.48)
pairs( PTS ~ FG + FT)
pairs(BPM ~ DBPM + OBPM)
pairs(FGA ~ X3PA + X2PA)
#We can observe that many of the variables present us with information that is not entirely relevant. To know which ones we are going to use, we are going to use reduction techniques that help us find the best variables for our model.
library(glmnet)
x <- model.matrix(quinteto~ Pos+Age+G+MP+PER+TS.+X3PAr+FTr+ORB.+TRB.+AST.+STL.+BLK.+TOV.+USG.+OWS+DWS+WS+WS.48+OBPM+DBPM+BPM+VORP+FG+FGA+FG.+X3P+X3PA+X3P.+X2P+X2PA+X2P.+eFG.+FT+FTA+FT.+PF+DRB.+ORB+TRB+DRB+AST+STL+BLK+TOV+PTS,bd)[,-1]
y <- bd$quinteto
lambdas <- 10^seq(5,-5,length=100)
set.seed(12345)
cv.lasso.NBA <- cv.glmnet(x,y,alpha=1,lambda=lambdas)
plot(cv.lasso.NBA)
cv.lasso.NBA$lambda.1se
[1] 0.001047616
lasso.final <- glmnet(x,y,alpha=1,lambda =round(cv.lasso.NBA$lambda.1se,3))
coef(lasso.final)[coef(lasso.final)[,1] !=0,]
(Intercept) PosPF PosSF Age G MP TS.
2.222875e-02 -1.436547e-03 -3.706560e-03 3.249107e-04 7.727267e-04 -1.716349e-04 -4.249438e-02
FTr STL. TOV. USG. DWS WS BPM
-1.607393e-02 -7.022702e-04 2.898649e-04 -2.537874e-04 6.191524e-03 1.821565e-02 -4.986440e-04
VORP FG FGA X2P FTA PF ORB
3.161288e-02 2.369019e-05 1.295914e-04 6.016663e-05 3.683313e-04 -3.212804e-04 -1.507431e-04
DRB AST STL BLK
1.636141e-04 1.907179e-04 -9.944877e-05 3.400609e-04
# We observe that these variables are the ones that the Lasso method indicates to us that they are more explanatory, since they have different coefficients from 0.
We see how the variables we are interested in are greatly reduced. As we are performing a logistic regression, in the variables we obtained from the previous point, we perform three models using three different methods which are the ones we will compare. These three methods are: Multiple Logistic Regression (GLM), Quadratic Discriminant Analysis (QDA), and Linear Discriminant Analysis (LDA). We do not take the KNN method because we already know that neighboring values are not interesting for predicting the next value.
We make comparisons between the models and look at the following results to choose the one that interests us most.
glmnba <- glm(quinteto ~ Pos+ Age + G + MP + TS. + FTr + STL. + TOV. + USG. +DWS + WS + BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + DRB + AST + STL + BLK, data = bd, family = 'binomial')
summary(glmnba)
Call:
glm(formula = quinteto ~ Pos + Age + G + MP + TS. + FTr + STL. +
TOV. + USG. + DWS + WS + BPM + VORP + FG + FGA + X2P + FTA +
PF + ORB + DRB + AST + STL + BLK, family = "binomial", data = bd)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.5398 -0.0255 -0.0097 -0.0032 3.1858
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.895e+00 1.979e+00 -4.999 5.76e-07 ***
PosC-PF -8.805e+00 1.146e+03 -0.008 0.993867
PosC-SF -5.389e+00 6.523e+03 -0.001 0.999341
PosPF -2.028e-01 3.279e-01 -0.618 0.536389
PosPF-C -1.042e+01 1.187e+03 -0.009 0.992995
PosPF-SF -8.776e+00 1.345e+03 -0.007 0.994794
PosPG 4.628e-01 6.661e-01 0.695 0.487173
PosPG-SF -1.081e+01 6.523e+03 -0.002 0.998678
PosPG-SG -1.095e+01 1.138e+03 -0.010 0.992323
PosSF -6.682e-02 4.532e-01 -0.147 0.882787
PosSF-PF -1.016e+01 1.376e+03 -0.007 0.994111
PosSF-SG -1.421e+01 9.751e+02 -0.015 0.988372
PosSG 8.679e-01 5.495e-01 1.579 0.114265
PosSG-PF -1.334e+01 3.341e+03 -0.004 0.996815
PosSG-PG -1.414e+01 1.172e+03 -0.012 0.990373
PosSG-SF -9.913e+00 1.070e+03 -0.009 0.992606
Age 5.493e-02 2.668e-02 2.059 0.039498 *
G -1.427e-01 2.310e-02 -6.176 6.59e-10 ***
MP -4.875e-05 6.237e-04 -0.078 0.937701
TS. -3.015e+00 2.827e+00 -1.066 0.286216
FTr 9.998e-01 6.818e-01 1.466 0.142559
STL. -2.864e-01 2.604e-01 -1.100 0.271476
TOV. 6.865e-02 2.689e-02 2.553 0.010692 *
USG. 1.094e-01 2.931e-02 3.732 0.000190 ***
DWS 4.977e-01 1.207e-01 4.123 3.75e-05 ***
WS 8.476e-01 9.517e-02 8.907 < 2e-16 ***
BPM 1.696e-01 4.532e-02 3.742 0.000183 ***
VORP -6.077e-01 1.610e-01 -3.776 0.000160 ***
FG 1.116e-02 6.547e-03 1.704 0.088382 .
FGA 1.356e-03 2.131e-03 0.636 0.524552
X2P -4.430e-03 3.174e-03 -1.396 0.162822
FTA 3.216e-04 1.220e-03 0.264 0.792056
PF -6.412e-03 2.603e-03 -2.463 0.013772 *
ORB 5.040e-03 2.085e-03 2.417 0.015657 *
DRB 1.462e-03 1.309e-03 1.117 0.263990
AST 5.366e-03 1.125e-03 4.768 1.86e-06 ***
STL 2.717e-03 5.768e-03 0.471 0.637620
BLK 8.365e-03 2.610e-03 3.204 0.001354 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3944.24 on 15359 degrees of freedom
Residual deviance: 894.74 on 15322 degrees of freedom
AIC: 970.74
Number of Fisher Scoring iterations: 17
step(glmnba)
Start: AIC=970.74
quinteto ~ Pos + Age + G + MP + TS. + FTr + STL. + TOV. + USG. +
DWS + WS + BPM + VORP + FG + FGA + X2P + FTA + PF + ORB +
DRB + AST + STL + BLK
Df Deviance AIC
- Pos 15 906.56 952.56
- MP 1 894.75 968.75
- FTA 1 894.81 968.81
- STL 1 894.98 968.98
- FGA 1 895.14 969.14
- FTr 1 895.86 969.86
- TS. 1 895.88 969.88
- DRB 1 895.99 969.99
- X2P 1 896.70 970.70
<none> 894.74 970.74
- STL. 1 896.93 970.93
- FG 1 897.70 971.70
- Age 1 898.98 972.98
- TOV. 1 899.58 973.58
- ORB 1 900.60 974.60
- PF 1 900.85 974.85
- USG. 1 903.61 977.61
- BLK 1 905.05 979.05
- BPM 1 907.97 981.97
- VORP 1 909.07 983.07
- DWS 1 912.32 986.32
- AST 1 919.51 993.51
- G 1 937.49 1011.49
- WS 1 981.81 1055.81
Step: AIC=952.56
quinteto ~ Age + G + MP + TS. + FTr + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + DRB +
AST + STL + BLK
Df Deviance AIC
- MP 1 906.58 950.58
- DRB 1 906.67 950.67
- FTA 1 906.85 950.85
- STL 1 906.92 950.92
- FGA 1 907.23 951.23
- TS. 1 907.50 951.50
- FTr 1 907.54 951.54
<none> 906.56 952.56
- STL. 1 908.71 952.71
- X2P 1 909.40 953.40
- FG 1 909.43 953.43
- Age 1 910.25 954.25
- TOV. 1 911.54 955.54
- ORB 1 912.01 956.01
- PF 1 913.32 957.32
- USG. 1 915.63 959.63
- BLK 1 919.05 963.05
- BPM 1 920.01 964.01
- VORP 1 921.76 965.76
- DWS 1 924.33 968.33
- G 1 945.96 989.96
- AST 1 951.25 995.25
- WS 1 998.09 1042.09
Step: AIC=950.58
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS +
BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + DRB + AST +
STL + BLK
Df Deviance AIC
- DRB 1 906.67 948.67
- FTA 1 906.86 948.86
- STL 1 906.93 948.93
- FGA 1 907.23 949.23
- TS. 1 907.54 949.54
- FTr 1 907.57 949.57
<none> 906.58 950.58
- STL. 1 908.75 950.75
- X2P 1 909.40 951.40
- FG 1 909.46 951.46
- Age 1 910.56 952.56
- TOV. 1 911.56 953.56
- ORB 1 912.04 954.04
- PF 1 913.32 955.32
- USG. 1 917.02 959.02
- BLK 1 919.10 961.10
- BPM 1 920.75 962.75
- VORP 1 922.01 964.01
- DWS 1 924.42 966.42
- AST 1 952.95 994.95
- G 1 961.91 1003.91
- WS 1 998.11 1040.11
Step: AIC=948.67
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS +
BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + AST + STL +
BLK
Df Deviance AIC
- STL 1 906.96 946.96
- FTA 1 906.97 946.97
- FGA 1 907.36 947.36
- TS. 1 907.60 947.60
- FTr 1 907.64 947.64
<none> 906.67 948.67
- STL. 1 908.85 948.85
- FG 1 909.52 949.52
- X2P 1 909.54 949.54
- Age 1 910.73 950.73
- TOV. 1 911.68 951.68
- PF 1 913.32 953.32
- ORB 1 914.50 954.50
- USG. 1 917.02 957.02
- BLK 1 919.29 959.29
- BPM 1 920.76 960.76
- VORP 1 922.05 962.05
- DWS 1 929.97 969.97
- AST 1 953.15 993.15
- G 1 962.20 1002.20
- WS 1 998.11 1038.11
Step: AIC=946.96
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS +
BPM + VORP + FG + FGA + X2P + FTA + PF + ORB + AST + BLK
Df Deviance AIC
- FTA 1 907.29 945.29
- FGA 1 907.67 945.67
- FTr 1 907.91 945.91
- TS. 1 907.92 945.92
<none> 906.96 946.96
- STL. 1 909.15 947.15
- X2P 1 909.82 947.82
- FG 1 909.84 947.84
- Age 1 910.82 948.82
- TOV. 1 911.71 949.71
- PF 1 913.44 951.44
- ORB 1 914.76 952.76
- USG. 1 917.32 955.32
- BLK 1 919.29 957.29
- BPM 1 921.09 959.09
- VORP 1 923.15 961.15
- DWS 1 932.44 970.44
- AST 1 961.81 999.81
- G 1 962.27 1000.27
- WS 1 1000.34 1038.34
Step: AIC=945.29
quinteto ~ Age + G + TS. + FTr + STL. + TOV. + USG. + DWS + WS +
BPM + VORP + FG + FGA + X2P + PF + ORB + AST + BLK
Df Deviance AIC
- TS. 1 908.16 944.16
- FGA 1 908.80 944.80
<none> 907.29 945.29
- X2P 1 909.83 945.83
- FG 1 909.84 945.84
- FTr 1 909.91 945.91
- STL. 1 910.05 946.05
- Age 1 910.95 946.95
- PF 1 913.47 949.47
- TOV. 1 913.55 949.55
- ORB 1 915.15 951.15
- BLK 1 919.76 955.76
- USG. 1 920.37 956.37
- BPM 1 923.21 959.21
- VORP 1 924.03 960.03
- DWS 1 933.02 969.02
- AST 1 962.37 998.37
- G 1 962.46 998.46
- WS 1 1034.66 1070.66
Step: AIC=944.16
quinteto ~ Age + G + FTr + STL. + TOV. + USG. + DWS + WS + BPM +
VORP + FG + FGA + X2P + PF + ORB + AST + BLK
Df Deviance AIC
- FG 1 909.89 943.89
- X2P 1 910.00 944.00
- FTr 1 910.01 944.01
<none> 908.16 944.16
- STL. 1 911.01 945.01
- Age 1 911.88 945.88
- FGA 1 912.83 946.83
- PF 1 914.58 948.58
- TOV. 1 915.08 949.08
- ORB 1 915.96 949.96
- USG. 1 920.37 954.37
- BLK 1 920.76 954.76
- BPM 1 924.04 958.04
- VORP 1 924.14 958.14
- DWS 1 934.91 968.91
- AST 1 962.39 996.39
- G 1 968.88 1002.88
- WS 1 1035.67 1069.67
Step: AIC=943.89
quinteto ~ Age + G + FTr + STL. + TOV. + USG. + DWS + WS + BPM +
VORP + FGA + X2P + PF + ORB + AST + BLK
Df Deviance AIC
- X2P 1 910.06 942.06
- FTr 1 911.09 943.09
<none> 909.89 943.89
- STL. 1 913.28 945.28
- Age 1 913.51 945.51
- PF 1 915.75 947.75
- ORB 1 916.18 948.18
- TOV. 1 918.77 950.77
- USG. 1 921.67 953.67
- BLK 1 922.58 954.58
- VORP 1 924.41 956.41
- BPM 1 926.71 958.71
- DWS 1 936.59 968.59
- FGA 1 951.18 983.18
- AST 1 965.50 997.50
- G 1 968.88 1000.88
- WS 1 1073.43 1105.43
Step: AIC=942.06
quinteto ~ Age + G + FTr + STL. + TOV. + USG. + DWS + WS + BPM +
VORP + FGA + PF + ORB + AST + BLK
Df Deviance AIC
- FTr 1 911.26 941.26
<none> 910.06 942.06
- STL. 1 913.44 943.44
- Age 1 913.67 943.67
- PF 1 916.46 946.46
- ORB 1 916.60 946.60
- TOV. 1 918.83 948.83
- USG. 1 921.99 951.99
- BLK 1 922.67 952.67
- VORP 1 925.05 955.05
- BPM 1 926.77 956.77
- DWS 1 937.24 967.24
- AST 1 966.11 996.11
- G 1 969.76 999.76
- FGA 1 984.38 1014.38
- WS 1 1088.96 1118.96
Step: AIC=941.26
quinteto ~ Age + G + STL. + TOV. + USG. + DWS + WS + BPM + VORP +
FGA + PF + ORB + AST + BLK
Df Deviance AIC
<none> 911.26 941.26
- Age 1 914.56 942.56
- STL. 1 915.38 943.38
- PF 1 917.01 945.01
- ORB 1 918.41 946.41
- TOV. 1 920.89 948.89
- BLK 1 924.19 952.19
- USG. 1 925.62 953.62
- VORP 1 927.48 955.48
- BPM 1 929.51 957.51
- DWS 1 937.73 965.73
- AST 1 968.08 996.08
- G 1 971.50 999.50
- FGA 1 984.52 1012.52
- WS 1 1107.48 1135.48
Call: glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial",
data = bd)
Coefficients:
(Intercept) Age G STL. TOV. USG. DWS
-10.818575 0.045524 -0.133487 -0.220845 0.083679 0.104698 0.447596
WS BPM VORP FGA PF ORB AST
0.885259 0.166501 -0.498484 0.004465 -0.005797 0.004185 0.004953
BLK
0.008042
Degrees of Freedom: 15359 Total (i.e. Null); 15345 Residual
Null Deviance: 3944
Residual Deviance: 911.3 AIC: 941.3
glmnba.final <- glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial",
data = bd)
summary(glmnba.final)
Call:
glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial",
data = bd)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.4771 -0.0263 -0.0104 -0.0038 3.1025
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.082e+01 1.289e+00 -8.390 < 2e-16 ***
Age 4.552e-02 2.504e-02 1.818 0.069083 .
G -1.335e-01 1.743e-02 -7.658 1.89e-14 ***
STL. -2.208e-01 1.340e-01 -1.648 0.099253 .
TOV. 8.368e-02 2.052e-02 4.078 4.54e-05 ***
USG. 1.047e-01 2.299e-02 4.554 5.27e-06 ***
DWS 4.476e-01 8.935e-02 5.009 5.46e-07 ***
WS 8.853e-01 7.298e-02 12.130 < 2e-16 ***
BPM 1.665e-01 4.224e-02 3.941 8.10e-05 ***
VORP -4.985e-01 1.253e-01 -3.978 6.96e-05 ***
FGA 4.465e-03 5.135e-04 8.695 < 2e-16 ***
PF -5.797e-03 2.426e-03 -2.390 0.016856 *
ORB 4.185e-03 1.564e-03 2.676 0.007447 **
AST 4.953e-03 6.977e-04 7.099 1.25e-12 ***
BLK 8.042e-03 2.247e-03 3.578 0.000346 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 3944.24 on 15359 degrees of freedom
Residual deviance: 911.26 on 15345 degrees of freedom
AIC: 941.26
Number of Fisher Scoring iterations: 10
nba.prob <- predict(glmnba.final,bdpredict,type="response")
nba.pred <- rep("0_NotInAllNba",length(bdpredict$Player))
nba.pred[nba.prob > 0.5] <- '1'
(mean(nba.prob - bdpredict$quinteto)^2)
[1] 1.420207e-05
table(nba.pred,bdpredict$quinteto)
nba.pred 0 1
0_NotInAllNba 3462 37
1 7 53
(3462 + 53 )/3559
[1] 0.987637
library(pROC)
pred2 <- predict(glmnba.final, type="response")
curv_roc4 <- roc(bd$quinteto,pred2)
plot(curv_roc4)
curv_roc4$auc
Area under the curve: 0.9946
ci(curv_roc4)
95% CI: 0.9934-0.9958 (DeLong)
library(MASS)
qdanba <- qda(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK,
data = bd)
qdanba
Call:
qda(quinteto ~ Age + G + STL. + TOV. + USG. + DWS + WS + BPM +
VORP + FGA + PF + ORB + AST + BLK, data = bd)
Prior probabilities of groups:
0 1
0.9718099 0.0281901
Group means:
Age G STL. TOV. USG. DWS WS BPM VORP FGA
0 26.88725 49.59088 1.653038 15.28385 18.71998 1.120620 2.189643 -2.635975 0.4346754 383.9318
1 27.33487 76.13626 1.972979 12.90139 26.87529 4.101848 11.593533 4.939261 5.0092379 1300.2032
PF ORB AST BLK
0 110.3052 60.32304 106.1581 23.16293
1 205.2540 163.26097 377.9677 83.81986
qda.pred <- predict(qdanba, bdpredict)
qda.class <- qda.pred$class
table(qda.class,bdpredict$quinteto)
qda.class 0 1
0 3296 4
1 173 86
mean(qda.class == bdpredict$quinteto)
[1] 0.9502669
c.roc <- roc(bdpredict$quinteto, qda.pred$posterior[,2])
plot(c.roc)
c.roc$auc
Area under the curve: 0.9881
ci(c.roc)
95% CI: 0.9841-0.992 (DeLong)
ldanba <- lda(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK,
data = bd)
ldanba
Call:
lda(quinteto ~ Age + G + STL. + TOV. + USG. + DWS + WS + BPM +
VORP + FGA + PF + ORB + AST + BLK, data = bd)
Prior probabilities of groups:
0 1
0.9718099 0.0281901
Group means:
Age G STL. TOV. USG. DWS WS BPM VORP FGA
0 26.88725 49.59088 1.653038 15.28385 18.71998 1.120620 2.189643 -2.635975 0.4346754 383.9318
1 27.33487 76.13626 1.972979 12.90139 26.87529 4.101848 11.593533 4.939261 5.0092379 1300.2032
PF ORB AST BLK
0 110.3052 60.32304 106.1581 23.16293
1 205.2540 163.26097 377.9677 83.81986
Coefficients of linear discriminants:
LD1
Age -0.0017043161
G -0.0016823175
STL. -0.0293856482
TOV. 0.0148408460
USG. 0.0280796224
DWS -0.0187349395
WS 0.2654048167
BPM -0.0110578645
VORP 0.4231109491
FGA 0.0006739763
PF -0.0086617870
ORB -0.0007880899
AST 0.0000610087
BLK 0.0073037155
lda.pred <- predict(ldanba, bdpredict)
lda.class <- lda.pred$class
table(lda.class,bdpredict$quinteto)
lda.class 0 1
0 3422 21
1 47 69
mean(lda.class == bdpredict$quinteto)
[1] 0.9808935
c2.roc <- roc(bdpredict$quinteto, lda.pred$posterior[,2])
plot(c2.roc)
c2.roc$auc
Area under the curve: 0.9901
ci(c2.roc)
95% CI: 0.9862-0.994 (DeLong)
We choose the GLM method as it is the one that best predicts true positives and negatives. On the one hand it is the one that reduces the false positives the most (really what interests us to the mistakes that the model makes), but the false negatives are higher than the other models. We can say that all three models are good, but for the above reasons we will stick with the GLM.
bd15 <- rbind(bd,df_any12,df_any13,df_any14)
bd16 <- rbind(bd15,df_any15)
bd17 <- rbind(bd16,df_any16)
glmnba2015 <- glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial",
data = bd15)
nba.prob2015 <- predict(glmnba2015,df_any15,type="response")
nba.pred2015 <- rep("0_NotInAllNba",length(df_any15$Player))
nba.pred2015[nba.prob2015 > 0.5] <- '1'
nba2015 <- sort(nba.prob2015, decreasing=TRUE)
prueba2015 <- nba2015[1:40]
probs <- (prueba2015)*100
nuevodata2015 <- data.frame(probs,df_any15[names(prueba2015),1:5])
glmnba2016 <- glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial",
data = bd16)
nba.prob2016 <- predict(glmnba2016,df_any16,type="response")
nba.pred2016 <- rep("0_NotInAllNba",length(df_any16$Player))
nba.pred2016[nba.prob2016 > 0.5] <- '1'
nba2016 <- sort(nba.prob2016, decreasing=TRUE)
prueba2016 <- nba2016[1:40]
probs <- (prueba2016)*100
nuevodata2016 <- data.frame(probs,df_any16[names(prueba2016),1:5])
glmnba2017 <- glm(formula = quinteto ~ Age + G + STL. + TOV. + USG. + DWS +
WS + BPM + VORP + FGA + PF + ORB + AST + BLK, family = "binomial",
data = bd17)
nba.prob2017 <- predict(glmnba2017,df_any17,type="response")
nba.pred2017 <- rep("0_NotInAllNba",length(df_any17$Player))
nba.pred2017[nba.prob2017 > 0.5] <- '1'
nba2017 <- sort(nba.prob2017, decreasing=TRUE)
prueba2017 <- nba2017[1:40]
probs <- (prueba2017)*100
nuevodata2017 <- data.frame(probs,df_any17[names(prueba2017),1:5])
funcion_posicions_2015 <- function(df){
contG = 0
contF = 0
contC = 0
conttotal = 0
playerselection <- c()
for (i in (1:length(df$Player))) {
if (contG < 6 & (df$Pos[i] == 'PG' | df$Pos[i] == 'SG')){
contG = contG + 1
conttotal = conttotal + 1
playerselection[conttotal] <- df$Player[i]
print(playerselection)
}
else if (contF < 5 & (df$Pos[i] == 'SF' | df$Pos[i] == 'PF')){
contF = contF +1
conttotal = conttotal +1
playerselection[conttotal] <- df$Player[i]
print(playerselection)
}
else if (contC < 4 & df$Pos[i] == 'C'){
contC = contC + 1
conttotal = conttotal + 1
playerselection[conttotal] <- df$Player[i]
print(playerselection)
}}
print(playerselection)
df<- df %>% filter(df$Player %in% playerselection)
return(df)
}
We create the dataframe with the predictions of the model, and if the result is correct or not. We have to create a function that chooses (from the players with the highest probability of the model) the positions in the All NBATeam.
funcion_posicions <- function(df){
contG = 0
contF = 0
contC = 0
conttotal = 0
playerselection <- c()
for (i in (1:length(df$Player))) {
if (contG < 6 & (df$Pos[i] == 'PG' | df$Pos[i] == 'SG')){
contG = contG + 1
conttotal = conttotal + 1
playerselection[conttotal] <- df$Player[i]
print(playerselection)
}
else if (contF < 6 & (df$Pos[i] == 'SF' | df$Pos[i] == 'PF')){
contF = contF +1
conttotal = conttotal +1
playerselection[conttotal] <- df$Player[i]
print(playerselection)
}
else if (contC < 3 & df$Pos[i] == 'C'){
contC = contC + 1
conttotal = conttotal + 1
playerselection[conttotal] <- df$Player[i]
print(playerselection)
}}
print(playerselection)
df<- df %>% filter(df$Player %in% playerselection)
return(df)
}
tablaprediccio15 <- funcion_posicions_2015(nuevodata2015)
[1] "James Harden"
[1] "James Harden" "Stephen Curry"
[1] "James Harden" "Stephen Curry" "Chris Paul"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard" "John Wall"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard" "John Wall" "Blake Griffin"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard" "John Wall" "Blake Griffin" "Marc Gasol"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard" "John Wall" "Blake Griffin" "Marc Gasol"
[13] "Tim Duncan"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard" "John Wall" "Blake Griffin" "Marc Gasol"
[13] "Tim Duncan" "DeMarcus Cousins"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard" "John Wall" "Blake Griffin" "Marc Gasol"
[13] "Tim Duncan" "DeMarcus Cousins" "DeAndre Jordan"
[1] "James Harden" "Stephen Curry" "Chris Paul" "Anthony Davis"
[5] "Russell Westbrook" "LeBron James" "Pau Gasol" "LaMarcus Aldridge"
[9] "Damian Lillard" "John Wall" "Blake Griffin" "Marc Gasol"
[13] "Tim Duncan" "DeMarcus Cousins" "DeAndre Jordan"
tablaprediccio15$quinteto <- as.logical(tablaprediccio15$quinteto)
tablaresultats15 <- tablaprediccio15 %>% select(Player, Age, Pos, probs, quinteto)
names(tablaresultats15)= c("Player","Age","Position","Probabilitat","Is in?")
tablaresultats15 %>% gt()%>% tab_header(
title = md("Best players 2015"))
| Best players 2015 | ||||
|---|---|---|---|---|
| Player | Age | Position | Probabilitat | Is in? |
| James Harden | 25 | SG | 99.82453 | TRUE |
| Stephen Curry | 26 | PG | 99.46583 | TRUE |
| Chris Paul | 29 | PG | 98.94948 | TRUE |
| Anthony Davis | 21 | PF | 98.68507 | TRUE |
| Russell Westbrook | 26 | PG | 97.67883 | TRUE |
| LeBron James | 30 | SF | 91.07932 | TRUE |
| Pau Gasol | 34 | PF | 79.18571 | TRUE |
| LaMarcus Aldridge | 29 | PF | 71.48565 | TRUE |
| Damian Lillard | 24 | PG | 49.70695 | FALSE |
| John Wall | 24 | PG | 39.83640 | FALSE |
| Blake Griffin | 25 | PF | 36.01446 | TRUE |
| Marc Gasol | 30 | C | 34.85991 | TRUE |
| Tim Duncan | 38 | C | 25.74312 | TRUE |
| DeMarcus Cousins | 24 | C | 21.20210 | TRUE |
| DeAndre Jordan | 26 | C | 17.97916 | TRUE |
no_corresponen2015 <- tablaprediccio15 %>% filter(quinteto %in% 0)
no_corresponen2015
probs quinteto Year Player Pos Age
351 49.70695 FALSE 2015 Damian Lillard PG 24
608 39.83640 FALSE 2015 John Wall PG 24
nuevodata2015 %>% filter(Pos %in% c("PG","SG"))
probs quinteto Year Player Pos Age
250 99.824535 1 2015 James Harden SG 25
136 99.465827 1 2015 Stephen Curry PG 26
448 98.949478 1 2015 Chris Paul PG 29
616 97.678831 1 2015 Russell Westbrook PG 26
351 49.706950 0 2015 Damian Lillard PG 24
608 39.836400 0 2015 John Wall PG 24
88 34.920491 0 2015 Jimmy Butler SG 25
284 32.246938 1 2015 Kyrie Irving PG 22
582 16.396576 1 2015 Klay Thompson SG 24
567 7.883518 0 2015 Jeff Teague PG 26
340 3.052430 0 2015 Ty Lawson PG 27
357 2.954811 0 2015 Kyle Lowry PG 28
602 1.958710 0 2015 Dwyane Wade SG 33
267 1.836709 0 2015 George Hill PG 28
62 1.768488 0 2015 Eric Bledsoe PG 25
122 1.671994 0 2015 Mike Conley PG 27
330 1.539479 0 2015 Brandon Knight PG 23
sustituts2015 <- nuevodata2015 %>% filter(Player %in% c("Kyrie Irving","Klay Thompson"))
sustituts2015
probs quinteto Year Player Pos Age
284 32.24694 1 2015 Kyrie Irving PG 22
582 16.39658 1 2015 Klay Thompson SG 24
correccio2015 <- rbind(no_corresponen2015,sustituts2015)
correccioresultats <- correccio2015 %>% select(Player, Age, Pos, probs, quinteto)
correccioresultats$quinteto <- as.logical(correccioresultats$quinteto)
names(correccioresultats)= c("Player","Age","Position","Probabilitat","Is in?")
correccioresultats %>% gt()%>% tab_header(
title = md("substitutions"))
| substitutions | ||||
|---|---|---|---|---|
| Player | Age | Position | Probabilitat | Is in? |
| Damian Lillard | 24 | PG | 49.70695 | FALSE |
| John Wall | 24 | PG | 39.83640 | FALSE |
| Kyrie Irving | 22 | PG | 32.24694 | TRUE |
| Klay Thompson | 24 | SG | 16.39658 | TRUE |
plotly2015 <- ggplot(data = tablaprediccio15,mapping = aes(x=reorder(Player,probs),y=probs,fill = quinteto)) + geom_bar(stat = "identity")+
theme_bw() + theme(axis.text.x=element_text(angle=90))+labs(title = "Prediction",
subtitle = "2015",
x = "Players",
y = "Probability") + coord_cartesian(ylim = c(10,100))
ggplotly(plotly2015)
plotly2015pos <- ggplot(data = tablaprediccio15,
mapping = aes( x = reorder(Player,probs),y=probs, fill = Pos)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("chartreuse","brown1","deepskyblue", "brown3","deepskyblue3")) +
theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predicition",
subtitle = "2015",
x = "Players",
y = "Probability") + coord_cartesian(ylim = c(10,100))
ggplotly(plotly2015pos)
Checking the model in 2015, we appreciate that the results obtained seem very accurate. Whereas we have a database with many players every season, in this case 650, manages to predict 13 of the 15 players at the ALL NBA TEAM.
Knowing that voting is subjective depending on the player’s game, and not on his statistics, we note that our model explains these votes with a very high probability of success. In the table of substitutions, these are the players who should be in the quintet (Kyrie Irving and Klay Thompson) replacing those who have not been able to correctly predict our model (Jhon Wall and Damian Lillard). We also show the probabilities that our model gives to these players. That way we can learn a little more about our mistakes.
To make this boxplot we do it with a database of the 30 players who are most likely to be in the quintet. On the right we can see as the median of the players who are going to be in the quintet that year we give a probability of 71.49%, while we give the other players a median of 6.59%. We note that there are 3 outliers who are the players our model predicts will be. One of them does not enter due to the restriction of positions.
In the position graph we can see the division of positions that explains a little more the errors of the model, as John Wall and Damian Lillard (both errors) enter although they have a higher probability than other players in the model, would enter position in the last two places.
###########2016
tablaprediccio16 <- funcion_posicions(nuevodata2016)
[1] "Stephen Curry"
[1] "Stephen Curry" "Russell Westbrook"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard" "LaMarcus Aldridge"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard" "LaMarcus Aldridge" "Hassan Whiteside"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard" "LaMarcus Aldridge" "Hassan Whiteside" "Draymond Green"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard" "LaMarcus Aldridge" "Hassan Whiteside" "Draymond Green"
[13] "Paul Millsap"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard" "LaMarcus Aldridge" "Hassan Whiteside" "Draymond Green"
[13] "Paul Millsap" "Anthony Davis"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard" "LaMarcus Aldridge" "Hassan Whiteside" "Draymond Green"
[13] "Paul Millsap" "Anthony Davis" "DeMarcus Cousins"
[1] "Stephen Curry" "Russell Westbrook" "Kevin Durant" "LeBron James"
[5] "James Harden" "Chris Paul" "Kawhi Leonard" "Kyle Lowry"
[9] "Damian Lillard" "LaMarcus Aldridge" "Hassan Whiteside" "Draymond Green"
[13] "Paul Millsap" "Anthony Davis" "DeMarcus Cousins"
tablaprediccio16$quinteto <- as.logical(tablaprediccio16$quinteto)
tablaresultats16 <- tablaprediccio16 %>% select(Player, Age, Pos, probs, quinteto)
names(tablaresultats16)= c("Player","Age","Position","Probability","Is in?")
tablaresultats16 %>% gt()%>% tab_header(
title = md("Best players 2016"))
| Best players 2016 | ||||
|---|---|---|---|---|
| Player | Age | Position | Probability | Is in? |
| Stephen Curry | 27 | PG | 99.95947 | TRUE |
| Russell Westbrook | 27 | PG | 99.66814 | TRUE |
| Kevin Durant | 27 | SF | 99.59944 | TRUE |
| LeBron James | 31 | SF | 99.27701 | TRUE |
| James Harden | 26 | SG | 97.84001 | FALSE |
| Chris Paul | 30 | PG | 94.96401 | TRUE |
| Kawhi Leonard | 24 | SF | 92.42842 | TRUE |
| Kyle Lowry | 29 | PG | 61.28511 | TRUE |
| Damian Lillard | 25 | PG | 60.52303 | TRUE |
| LaMarcus Aldridge | 30 | PF | 51.54793 | TRUE |
| Hassan Whiteside | 26 | C | 49.35669 | FALSE |
| Draymond Green | 25 | PF | 48.86015 | TRUE |
| Paul Millsap | 30 | PF | 42.52445 | FALSE |
| Anthony Davis | 22 | C | 22.50300 | FALSE |
| DeMarcus Cousins | 25 | C | 22.24952 | TRUE |
no_corresponen2016 <- tablaprediccio16 %>% filter(quinteto %in% 0)
nuevodata2016 %>% filter(Pos %in% c("SG","PF","SF","C"))
probs quinteto Year Player Pos Age
138 99.599444 1 2016 Kevin Durant SF 27
265 99.277015 1 2016 LeBron James SF 31
206 97.840012 0 2016 James Harden SG 26
323 92.428424 1 2016 Kawhi Leonard SF 24
7 51.547933 1 2016 LaMarcus Aldridge PF 30
555 49.356690 0 2016 Hassan Whiteside C 26
189 48.860151 1 2016 Draymond Green PF 25
127 43.630190 0 2016 DeMar DeRozan SG 26
376 42.524450 0 2016 Paul Millsap PF 30
176 35.686584 1 2016 Paul George SF 25
119 22.502996 0 2016 Anthony Davis C 22
107 22.249519 1 2016 DeMarcus Cousins C 25
78 20.457942 0 2016 Jimmy Butler SG 26
173 19.153645 0 2016 Pau Gasol C 35
293 16.846014 1 2016 DeAndre Jordan C 27
246 11.252690 0 2016 Al Horford C 29
165 9.584321 0 2016 Jimmer Fredette SG 26
220 7.914784 0 2016 Gordon Hayward SF 25
23 7.542809 0 2016 Carmelo Anthony SF 31
133 6.070247 1 2016 Andre Drummond C 22
517 5.361564 1 2016 Klay Thompson SG 25
332 4.959915 0 2016 Kevin Love PF 27
153 4.136556 0 2016 Derrick Favors PF 24
541 4.031622 0 2016 Dwyane Wade SG 34
62 3.951863 0 2016 Chris Bosh PF 31
524 3.781190 0 2016 Karl-Anthony Towns C 20
198 3.604408 0 2016 Blake Griffin PF 26
330 2.151448 0 2016 Brook Lopez C 27
539 1.645186 0 2016 Nikola Vucevic C 25
sustituts2016 <- nuevodata2016 %>% filter(Player %in% c("DeAndre Jordan","Andre Drummond","Klay Thompson","Paul George"))
correccio2016 <- rbind(no_corresponen2016,sustituts2016)
correccioresultats2016 <- correccio2016 %>% select(Player, Age, Pos, probs, quinteto)
correccioresultats2016$quinteto <- as.logical(correccioresultats2016$quinteto)
names(correccioresultats2016)= c("Player","Age","Position","Probability","Is in?")
correccioresultats2016 %>% gt()%>% tab_header(
title = md("Substitutions"))
| Substitutions | ||||
|---|---|---|---|---|
| Player | Age | Position | Probability | Is in? |
| James Harden | 26 | SG | 97.840012 | FALSE |
| Hassan Whiteside | 26 | C | 49.356690 | FALSE |
| Paul Millsap | 30 | PF | 42.524450 | FALSE |
| Anthony Davis | 22 | C | 22.502996 | FALSE |
| Paul George | 25 | SF | 35.686584 | TRUE |
| DeAndre Jordan | 27 | C | 16.846014 | TRUE |
| Andre Drummond | 22 | C | 6.070247 | TRUE |
| Klay Thompson | 25 | SG | 5.361564 | TRUE |
############## Graphs
plotly2016 <- ggplot(data = tablaprediccio16,mapping = aes(x=reorder(Player,probs),y=probs,fill = quinteto)) +
geom_bar(stat = "identity")+
theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predictions",
subtitle = "2016",
x = "Players",
y = "Probability") + coord_cartesian(ylim = c(10,100))
ggplotly(plotly2016)
plotly2016pos <- ggplot(data = tablaprediccio16,
mapping = aes( x = reorder(Player,probs),y=probs, fill = Pos)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("chartreuse","brown1","deepskyblue", "brown3","deepskyblue3")) +
theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predictions",
subtitle = "2016",
x = "Players",
y = "Probability") + coord_cartesian(ylim = c(10,100))
ggplotly(plotly2016pos)
This year is the year in which we find the most mistakes, especially focusing on the mistake of James Hardem who gives him a 97.8% probability of belonging to the quintets. Researching a bit about the player we realize that he belongs to the quintets from 2013 to 2019 (with the exception of this year) and since 2014 he always appears in the first quintet.
We note that this year is the year in which the player got the fewest wins (a difference of 14 compared to other years), the mistake is because our model does not consider them. We think that this lack of victories influenced the voting. Although his individual statistics were very prominent. This player would have entered the quintet according to the votes in that year in the NBA, but as we know, there is a restriction of positions, which caused him not to enter.
In the table we can find which players have to be in the quintet replacing the model errors.
This year we have the highest average of players who are not in the quintet, although we can see that the median probability of players who are in the quintet is twice that of those who are not. Therefore we can consider that we make a good prediction.
As we can see in this model, we already have 6 players in the “guard” position, this fact causes James Harden to not be able to enter this year’s ALL NBA TEAM.
#############2017
tablaprediccio17 <- funcion_posicions(nuevodata2017)
[1] "James Harden"
[1] "James Harden" "Russell Westbrook"
[1] "James Harden" "Russell Westbrook" "LeBron James"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas" "Giannis Antetokounmpo"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas" "Giannis Antetokounmpo" "Karl-Anthony Towns"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas" "Giannis Antetokounmpo" "Karl-Anthony Towns" "John Wall"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas" "Giannis Antetokounmpo" "Karl-Anthony Towns" "John Wall"
[13] "Rudy Gobert"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas" "Giannis Antetokounmpo" "Karl-Anthony Towns" "John Wall"
[13] "Rudy Gobert" "DeMar DeRozan"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas" "Giannis Antetokounmpo" "Karl-Anthony Towns" "John Wall"
[13] "Rudy Gobert" "DeMar DeRozan" "Gordon Hayward"
[1] "James Harden" "Russell Westbrook" "LeBron James" "Anthony Davis"
[5] "Kawhi Leonard" "Kevin Durant" "Jimmy Butler" "Stephen Curry"
[9] "Isaiah Thomas" "Giannis Antetokounmpo" "Karl-Anthony Towns" "John Wall"
[13] "Rudy Gobert" "DeMar DeRozan" "Gordon Hayward"
tablaprediccio17$quinteto <- as.logical(tablaprediccio17$quinteto)
tablaresultats17 <- tablaprediccio17 %>% select(Player, Age, Pos, probs, quinteto)
names(tablaresultats17)= c("Player","Age","Position","Probability","Is in?")
tablaresultats17 %>% gt()%>% tab_header(
title = md("Best layers 2017"))
| Best layers 2017 | ||||
|---|---|---|---|---|
| Player | Age | Position | Probability | Is in? |
| James Harden | 27 | PG | 99.90859 | TRUE |
| Russell Westbrook | 28 | PG | 99.90621 | TRUE |
| LeBron James | 32 | SF | 98.82146 | TRUE |
| Anthony Davis | 23 | C | 97.47270 | TRUE |
| Kawhi Leonard | 25 | SF | 97.37003 | TRUE |
| Kevin Durant | 28 | SF | 96.08908 | TRUE |
| Jimmy Butler | 27 | SF | 95.59811 | TRUE |
| Stephen Curry | 28 | PG | 95.06756 | TRUE |
| Isaiah Thomas | 27 | PG | 94.65219 | TRUE |
| Giannis Antetokounmpo | 22 | SF | 88.08650 | TRUE |
| Karl-Anthony Towns | 21 | C | 86.31123 | FALSE |
| John Wall | 26 | PG | 85.78017 | TRUE |
| Rudy Gobert | 24 | C | 75.74142 | TRUE |
| DeMar DeRozan | 27 | SG | 71.95088 | TRUE |
| Gordon Hayward | 26 | SF | 40.50799 | FALSE |
no_corresponen2017 <- tablaprediccio17 %>% filter(quinteto %in% 0)
no_corresponen2017
probs quinteto Year Player Pos Age
530 86.31123 FALSE 2017 Karl-Anthony Towns C 21
226 40.50799 FALSE 2017 Gordon Hayward SF 26
nuevodata2017 %>% filter(Pos %in% c("PF","SF","C"))
probs quinteto Year Player Pos Age
270 98.821462 1 2017 LeBron James SF 32
123 97.472699 1 2017 Anthony Davis C 23
319 97.370031 1 2017 Kawhi Leonard SF 25
144 96.089081 1 2017 Kevin Durant SF 28
80 95.598108 1 2017 Jimmy Butler SF 27
20 88.086502 1 2017 Giannis Antetokounmpo SF 22
530 86.311228 0 2017 Karl-Anthony Towns C 21
190 75.741423 1 2017 Rudy Gobert C 24
562 47.777896 0 2017 Hassan Whiteside C 27
111 47.182194 0 2017 DeMarcus Cousins C 26
226 40.507995 0 2017 Gordon Hayward SF 26
207 16.290151 0 2017 Blake Griffin PF 27
178 13.414475 0 2017 Marc Gasol C 32
294 10.635179 1 2017 DeAndre Jordan C 28
285 8.761690 0 2017 Nikola Jokic C 21
9 7.964123 0 2017 LaMarcus Aldridge PF 31
183 5.072372 0 2017 Paul George SF 26
250 4.244377 0 2017 Dwight Howard C 31
203 4.138469 1 2017 Draymond Green PF 26
333 4.036093 0 2017 Kevin Love PF 28
372 3.296743 0 2017 Paul Millsap PF 31
149 1.640858 0 2017 Joel Embiid C 22
sustituts2017 <- nuevodata2017 %>% filter(Player %in% c("DeAndre Jordan","Draymond Green"))
correccio2017 <- rbind(no_corresponen2017,sustituts2017)
correccioresultats2017 <- correccio2017 %>% select(Player, Age, Pos, probs, quinteto)
correccioresultats2017$quinteto <- as.logical(correccioresultats2017$quinteto)
names(correccioresultats2017)= c("Player","Age","Position","Probability","Is in?")
correccioresultats2017 %>% gt()%>% tab_header(
title = md("Substitutions"))
| Substitutions | ||||
|---|---|---|---|---|
| Player | Age | Position | Probability | Is in? |
| Karl-Anthony Towns | 21 | C | 86.311228 | FALSE |
| Gordon Hayward | 26 | SF | 40.507995 | FALSE |
| DeAndre Jordan | 28 | C | 10.635179 | TRUE |
| Draymond Green | 26 | PF | 4.138469 | TRUE |
############# Graphs
plotly2017 <- ggplot(data = tablaprediccio17,mapping = aes(x=reorder(Player,probs),y=probs,fill = quinteto)) +
geom_bar(stat = "identity")+
theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Predictions",
subtitle = "2017",
x = "Player",
y = "Probability") + coord_cartesian(ylim = c(0,100))
ggplotly(plotly2017)
plotly2017pos <- ggplot(data = tablaprediccio17,
mapping = aes( x = reorder(Player,probs),y=probs, fill = Pos)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("chartreuse","brown1","deepskyblue", "brown3","deepskyblue3")) +
theme_bw() +theme(axis.text.x=element_text(angle=90))+labs(title = "Prediction",
subtitle = "2017",
x = "Player",
y = "Probability") + coord_cartesian(ylim = c(0,100))
ggplotly(plotly2017pos)
This year we can consider that there are many players with a very high probability of belonging to the quintet. We note that there are very few errors. There are two mistakes, and they are not in the top 10.
Karl-Anthony Towns of the Minnesota Timberwolves team, has a total of 31 wins and 51 losses. Being these the minimum of victories of all the predicted players. One thing we can also highlight is that this player was 16th in the quintet positions, with 4 points less than Deandre Jordan who came in 15th.
Gordon Hayward that year made the year in best statistics. It was his only year with more than 20 points per game played. It was the only year he was selected for the NBA All Star.
The boxplot diagram shows a big difference between the two groups, a little even more remarkable than the other years. With averages of 8.76% compared to 95.07%.
Focusing on the outlier we find, we realize that it is Deandre Jordan with a probability of 10.64% and that he occupies the same position as Karl-Anthony Towns (the mistake of before). Deandre Jordan is a player with a very defensive facet, so he did not have very good statistics, but he has a very good reputation in the league. His team scored 20 more victories this year than the Karl-Anthony Towns team, also entering the playoffs at the top of the table.
In the position graph we find it interesting to see how there is no player in the PS position who currently has many changes in the competition. Occupying it to the extent by SF or C players.